Merging Blocks and TAZs

Author

Sophie Fox and Steven Andrews

Published

February 6, 2023

TODO:

Notes:

General Goal

The TDM outputs results at a TAZ level. However, for equity analyses, we need to allocated census demographics to those TAZs–the model does not output results based on minority status, income status, or any other status that might be interesting for an equity analysis.

Census geometry does NOT fit neatly into TAZ geometry. We would like to allocate pieces of census demographics to surrounding TAZs to create demographics of the TAZs. We need to create a table that we can use to allocate portions of census geometry into TAZs.

Problems:

  • TAZs are mostly made up of blocks, but they’re not exactly made up of blocks.
  • There are small tweaks to the TAZ geometry throughout the model region that mean lines do not overlap exactly.
  • Census blocks are NOT necessarily the best starting point because the differential privacy features of the 2020 census have rendered them unreliable.
  • TAZs may be smaller than a census geometry.

Constraints:

  • TAZ geometry may change over time. The process should be repeatable for arbitrary TAZ geometry (this essentially means for ANY geometry).

  • We want the process to be maintainable over multiple ACS iterations.

Other Goals

  • Create multiple methods to compare the sensitivity to methodology.

Potential Solutions

When joining 2010 census blocks to TAZs, Paul Reim performed a series of intersections to identify where blocks were split. Where a TAZ split a block, he counted rooftops and used StreetView to estimate how many rooftops were in each TAZ. It is not desirable to maintain such a process–the reproducibility is limited and its highly manual. We can replicate the methodology using the rooftops layer. There are other solutions available as well:

  1. Rooftops
    1. Allocate based on where the largest piece of the rooftop is
    2. Where the centroid of the roof is
    3. Allocate proportionally based on the rooftop area.
  2. Land Area
  3. Total Area
  4. Dasymetric Mapping

Useful references

Set up Packages

library(tidyverse)
library(tidycensus)
library(tigris)
library(mapview)
library(sf)
library(units)

Load Census, TAZ, and Rooftop Shapes

#Load TAZ shapes
TAZ <- sf::read_sf("J:/Shared drives/Data_Requests/TAZ_Shapefiles/R_Script/out/geopackage/TAZ19.gpkg") %>% 
  st_transform(26986) %>% 
  filter(STATE == "MA") # Filtering to only in MA

TAZ$TAZ_area <- st_area(TAZ)
  #filter(ID == 4398) # Filtering for one example

# OBtain the 2020 block_groups from the tigris package.
# The CB false (the more detailed) geometry looked like it matched up better, so we move forward with that path.
block_groups <- tigris::block_groups(state = "MA", 
                                cb = FALSE, 
                                year = 2020) %>% 
  st_transform(26986)

# Clip the block_groups to match the geometry of the TAZ file. This will eliminate coastal waters but retain inland waters. This is a useful starting place to have a similar coast line. This will also make our `erase_water` function work on a more similar set of water.

block_groups_nocoast <- 
  st_intersection(block_groups, TAZ |> summarize()) |> 
  st_collection_extract("POLYGON")
Warning: attribute variables are assumed to be spatially constant throughout all
geometries
# Review the differences. It looks like what we'd expect. Coastal waters are not in the dataset anymore. 
# mapview(block_groups) + 
# mapview(block_groups_nocoast , col.regions = "orange")

# Overwrite the original file with the no_coast version and clean up the workspace.
block_groups <- block_groups_nocoast
rm(block_groups_nocoast)

#massgis rooftops: https://www.mass.gov/info-details/massgis-data-building-structures-2-d
# Download the file into "./data/base/" and then read it in here. Save it as a .rds for easy loading. Optionally, we could 
# rooftops <- st_read("data/base/STRUCTURES_POLY.shp")
# 
# saveRDS(rooftops, "data/processed/rooftops.rds")


# Load Population Census Data
# downloaded from: https://data.census.gov/table?t=Race+and+Ethnicity&g=0400000US25$1500000&tid=DECENNIALPL2020.P1
minority <- read_csv("data/base/DECENNIALPL2020.P1-Data.csv")
New names:
Rows: 5117 Columns: 145
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(144): GEO_ID, NAME, P1_001N, P1_001NA, P1_002N, P1_002NA, P1_003N, P1_0... lgl
(1): ...145
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...145`
# P2_005N: Non-Hispanic/Latino, White-alone.
# P1_001N: Total Population.
minority_tidy <- tidycensus::get_decennial(
  geography = "block group",
  state = "MA",
  variable = "P2_005N",
  summary_var = "P1_001N",
  year = 2020)
Getting data from the 2020 decennial Census
Using the PL 94-171 Redistricting Data summary file
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.

Only Looking at Plymouth County

We start by filtering out for just Plymouth County to keep the problem small while we work through the process.

# ply = Plymouth.
TAZ_ply <- TAZ %>% 
  filter(COUNTY == "Plymouth MA")

# 023 = Plymouth
# See: tidycensus::fips_codes |> filter(state == "MA", county_code == "023")
blk_group_ply <- block_groups %>% 
  filter(COUNTYFP == "023")

Reordering The Steps

Find Area of census block groups

blk_group_ply$BG_TOTAL_AREA <- st_area(blk_group_ply)

Intersect census block groups with TAZ

intersections <- st_intersection(TAZ_ply, blk_group_ply) %>%
  st_collection_extract("POLYGON") %>% 
  rowid_to_column("Intersection_ID") # unique ID for each intersection geometry
Warning: attribute variables are assumed to be spatially constant throughout all
geometries
intersections$INTERSECTION_TOTAL_AREA <- st_area(intersections)

Erase Water From Intersections

intersections_land <- tigris::erase_water(intersections, 
                                area_threshold = 0.5, 
                                year = 2020)
Fetching area water data for your dataset's location...
Erasing water area...
If this is slow, try a larger area threshold value.

Recalculate Land Area for the Intersections

intersections_land$INTERSECTION_LAND_AREA <- st_area(intersections_land)

Find Total Land Area for Census Block Groups

intersections_land <- intersections_land %>% 
  group_by(GEOID) %>% #grouping by block group
  mutate(BG_LAND_AREA = sum(INTERSECTION_LAND_AREA))

library(ggplot2)
ggplot(intersections_land, aes(x = BG_LAND_AREA, y = ALAND)) + geom_smooth(method = "lm") + geom_point() + coord_equal() +  geom_abline(slope = 1) 
`geom_smooth()` using formula 'y ~ x'

Example

example <- intersections_land %>% 
  filter(ID == 1759)

mapview(TAZ_ply |> 
          filter(ID %in% example$ID), 
        col.regions = "red", 
        layer.name = "TAZ") + 
  mapview(blk_group_ply |> 
            filter(GEOID %in% example$GEOID),
          alpha.regions = 0.01, lwd = 3,
          layer.name = "CBG") + 
  mapview(example, zcol = "Intersection_ID",
          layer.name = "Intersection")

Allocate based on rooftops

Determine the Number of Rooftops in Each Intersection Area

We pull the pre-stored rooftops file in and show an example. Feel free to see how the file matches the footprints in the OSM layer and the aerial satellite layer.

# rooftops.rds reated previously.
rooftops <- read_rds("./data/processed/rooftops.rds")
mapview(rooftops |> filter(STRUCT_ID == "53226_895084"))

Looking at repeated and NA IDs for rooftops

# Find the IDs of repeats and how many times they are repeated
repeats <- rooftops |> st_drop_geometry() |> group_by(STRUCT_ID) |> tally() |> filter(n>1)
View(repeats)

# Find the rooftops with unique STRUCT_IDs
unique_roof <- rooftops %>% filter(!(STRUCT_ID %in% repeats$STRUCT_ID)) %>% 
  select(STRUCT_ID, geometry)
# Find the rooftops with repeated or NA STRUCT_IDs
repeat_roof <- rooftops %>% filter(STRUCT_ID %in% repeats$STRUCT_ID) %>% 
  select(STRUCT_ID, geometry)

# All NAs will be replaced by a new id
repeat_roof <- repeat_roof |> 
  mutate(STRUCT_ID = coalesce(STRUCT_ID, paste0(row_number(),"_replaced_na") ) )

# Summarizing to get new geometry
repeat_roof <- repeat_roof %>% 
  group_by(STRUCT_ID) %>% 
  summarize() 

mapview(repeat_roof)
# combining the rooftops with unique ids with new summarized roof geometries
final_roofs <- bind_rows(unique_roof, repeat_roof)

# Some of the repeats are where there are buildings for schools, like UMass Boston
# Others are where there are multiple polygons that have the same struct ID and just look like a main building with an extra attachment.  

# Option 1: summarize repeats and add back into no duplicates
# Option 2: Go through manually and take the largest

#NA_rooftops <- rooftops %>% 
#  filter(is.na(STRUCT_ID) == TRUE)
#x <- head(NA_rooftops,1)
#mapview(x)
# NA struct IDs seem to be random

We now find which rooftops are in which intersection area.

#intersecting the rooftops with the census block groups
#roofs_in_bg <- st_intersection(blk_groups_land, rooftops)

# finding the number of rooftops in each census block group
# num_rooftops <- roofs_in_bg %>% 
#   group_by(GEOID) %>% 
#   mutate(num_roof = n()) %>%  
#   distinct(GEOID, num_roof)
# saveRDS(roofs_in_bg, "data/processed/rooftops_in_bg.rds")
# saveRDS(num_rooftops, "data/processed/num_rooftops.rds")


intersections_land <- intersections_land %>% 
  select(Intersection_ID, ID, GEOID, BG_TOTAL_AREA, INTERSECTION_TOTAL_AREA, INTERSECTION_LAND_AREA, BG_LAND_AREA)

# Joining the intersection areas with the rooftops that reside inside of them.
# 
# See this thread for the basic idea. The default join for 
# st_join is st_intersects, this is basically intersect the two and choose pull the info of the largest intersection area into the rooftop dataset.
# https://github.com/r-spatial/sf/issues/578


# NOTE: have to rethink the join to account for total bg area as opposed to the largest intersection area (see Jamboard)
# https://jamboard.google.com/d/1L5YrUrmOUpmQp8X6w2gaz51Fz6We8b0uV-Hm5g4aU2E/viewer?ts=63d15e99&f=1
# This took a very long time
roofs_in_intersect <- st_join(
  final_roofs,  
  intersections_land,
  left= FALSE, # inner join
  largest = TRUE)
Warning: attribute variables are assumed to be spatially constant throughout all
geometries
# Here's an example of what this is doing.
roofs_in_intersect_filt <- roofs_in_intersect |> 
  filter(ID %in% c(2708, 2709)) |> 
  mutate(Intersection_ID = as.character(Intersection_ID))

mapview(roofs_in_intersect_filt, zcol = "Intersection_ID") +
  mapview(intersections |> 
            filter(ID %in% roofs_in_intersect_filt$ID), 
          alpha.region = 0.01, layer.name = "Intersection")
num_rooftops <- roofs_in_intersect %>% 
  st_drop_geometry() %>% 
  group_by(Intersection_ID) %>% 
  mutate(num_roof = n()) %>% 
  distinct(Intersection_ID, ID, GEOID, BG_TOTAL_AREA, BG_LAND_AREA, INTERSECTION_TOTAL_AREA, INTERSECTION_LAND_AREA, num_roof)

saveRDS(roofs_in_intersect, "data/processed/rooftops_in_intersect.rds")
saveRDS(num_rooftops, "data/processed/num_rooftops_intersect.rds")

Looking to see how many rooftops are in multiple intersections

roofs_in_intersect2 <- st_join(
  final_roofs,  
  intersections_land,
  left= FALSE) %>% 
  group_by(STRUCT_ID) %>% 
  mutate(num_intersections = n()) 

roofs_in_intersect2 <- roofs_in_intersect2 %>% 
  distinct(STRUCT_ID, num_intersections) %>% 
  filter(num_intersections > 1)

# 4 intersections: 6 rooftops
# 3 intersections: 143 rooftops
# 2 intersections: 104 rooftops
# Percent of rootops (240492) that are in 3+ intersections: 0.06%

Determine the percentage of population that belongs to each intersection in the TAZ based on the number of rooftops

# Pull in the rooftops that were calculated in the last step.
roofs_in_intersect <- read_rds("./data/processed/rooftops_in_intersect.rds")

num_rooftops <-  read_rds("data/processed/num_rooftops_intersect.rds")

# Calculate the share of the GEOID in each TAZ.
rooftop_share <- num_rooftops |> 
  group_by(GEOID) |>  
  mutate(rooftop_pct = num_roof/sum(num_roof))

write_csv(rooftop_share, "./output/data/rooftop_share_plymouth.csv")
# Get population for block groups
census_pop <- minority_tidy %>% 
  select(c(GEOID, summary_value)) %>% 
  rename(population_BG = summary_value)

census_pop$GEOID <-str_remove_all(census_pop$GEOID, "1500000US")

pop_percent <- rooftop_share %>% 
  left_join(census_pop) %>% 
  # per BG: get population in each intersecting TAZ
  mutate(population_intersection = rooftop_pct * as.numeric(population_BG)) %>% 
  group_by(ID) %>% 
  # per TAZ: add up the population in each intersecting BG
  mutate(population_TAZ_roof = sum(population_intersection)) %>% 
  ungroup()
Joining, by = "GEOID"
TAZ_pop_rooftops <- pop_percent %>% 
  distinct(ID, population_TAZ_roof)

TAZ_plot <- TAZ_ply %>% 
  left_join(TAZ_pop_rooftops)
Joining, by = "ID"
mapview(TAZ_plot, zcol = "population_TAZ_roof")

Allocate based on area

Land Area only

This performs an intersection and allocation between CBGs and TAZs based on land area (major water features are removed).

# Create the land area allocation table for export.
land_area_alloc <- intersections_land %>% 
  group_by(GEOID) |> 
  # per BG: get percentage of GEOID land area that is in each intersection
  mutate(BG_LA_pct = drop_units(INTERSECTION_LAND_AREA / BG_LAND_AREA))

write_csv(land_area_alloc, "./output/data/landarea_share_plymouth.csv")

land_areas <- land_area_alloc %>% 
  left_join(census_pop) %>% 
  # per BG: get population in each intersecting TAZ
  mutate(population_intersection = BG_LA_pct * as.numeric(population_BG)) %>% 
  group_by(ID) %>% 
  # per TAZ: add up the population in each intersecting BG
  mutate(population_TAZ_LA = sum(population_intersection)) %>% 
  ungroup()
Joining, by = "GEOID"
TAZ_pop_LandArea <- land_areas %>% 
  distinct(ID, population_TAZ_LA)

TAZ_plot <- TAZ_plot %>% 
  left_join(TAZ_pop_LandArea)
Joining, by = "ID"
mapview(TAZ_plot, zcol = "population_TAZ_LA")

Total Area

This performs an intersection and allocation between CBGs and TAZs based on all area, including water area.

# Create the full area (land + water) allocation table for export.
tot_area_alloc <- intersections_land %>%
  group_by(GEOID) |> 
  # per BG: get percentage of all area that is in each intersecting TAZ
  mutate(BG_AA_pct = drop_units(INTERSECTION_TOTAL_AREA / BG_TOTAL_AREA))

write_csv(tot_area_alloc, "./output/data/allarea_share_plymouth.csv")

tot_areas <- tot_area_alloc %>% 
  left_join(census_pop) %>% 
  # per BG: get population in each intersecting TAZ
  mutate(population_intersection = BG_AA_pct * as.numeric(population_BG)) %>% 
  group_by(ID) %>% 
  # per TAZ: add up the population in each intersecting BG
  mutate(population_TAZ_AA = sum(population_intersection)) %>% 
  ungroup()
Joining, by = "GEOID"
TAZ_pop_TotArea <- tot_areas %>% 
  distinct(ID, population_TAZ_AA)


TAZ_plot <- TAZ_plot %>% 
  left_join(TAZ_pop_TotArea)
Joining, by = "ID"
mapview(TAZ_plot, zcol = "population_TAZ_AA")
# Pct Difference between the two
diff <- TAZ_pop_TotArea %>% 
  left_join(TAZ_pop_LandArea) %>% 
  mutate(pct_diff = population_TAZ_AA / population_TAZ_LA )
Joining, by = "ID"
ggplot(diff, aes(x = pct_diff)) +
  geom_histogram(binwidth = 0.25) +
  scale_x_continuous(breaks = seq(0,25,1))

hist(diff$pct_diff, 80)

Dasymetric Allocation

Join the Roofs, Land Area, All Area Methods

# Join them all down here for exploration.

joined_estimates <- TAZ_pop_rooftops %>% 
  left_join(TAZ_pop_LandArea) %>% 
  left_join((TAZ_pop_TotArea))
Joining, by = "ID"
Joining, by = "ID"
# Import the census block group pops once, then mutiply the populations here. 

Testing

Compare Population Estimates

Test minority share

Thoughts:

  • some TAZs are smaller than blockgroups. Do we need to adjust the method of analysis for those cases? It could be the same. Since we are looking for the percentage of the block group that is within the TAZ, then it wouldn’t matter if it is smaller or larger. In this case, it just wouldn’t also be sharing area with other block groups.